20 research outputs found
Few-shot classification in Named Entity Recognition Task
For many natural language processing (NLP) tasks, the amount of annotated data
is limited. This motivates the use of semi-supervised learning techniques such
as transfer learning and meta-learning. In this work we tackle the Named Entity
Recognition (NER) task using Prototypical Networks, a metric learning
technique. The model learns intermediate representations of words that cluster
well into named entity classes. This property allows it to classify words from
an extremely limited number of training examples, and it can potentially be
used as a zero-shot learning method. By coupling this technique with transfer
learning, we achieve well-performing classifiers trained on only 20 instances
of a target class.
Comment: In proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing
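A Prototypical Network classifies a query by its distance to class prototypes, the mean embeddings of the few labeled support examples per class. A minimal NumPy sketch of that inference step, with hand-made 2-D vectors standing in for learned word representations (all values are toy assumptions, not from the paper):

```python
import numpy as np

def prototypes(support_embeddings, support_labels):
    """Mean embedding per class: the class 'prototype'."""
    classes = np.unique(support_labels)
    protos = np.stack([
        support_embeddings[support_labels == c].mean(axis=0) for c in classes
    ])
    return classes, protos

def classify(query_embeddings, classes, protos):
    """Assign each query to the nearest prototype (Euclidean distance)."""
    d = np.linalg.norm(query_embeddings[:, None, :] - protos[None, :, :], axis=-1)
    return classes[d.argmin(axis=1)]

# Toy 2-D word embeddings: two entity classes, two support examples each.
support = np.array([[0.0, 0.0], [0.1, -0.1], [5.0, 5.0], [5.2, 4.9]])
labels = np.array([0, 0, 1, 1])
classes, protos = prototypes(support, labels)
preds = classify(np.array([[0.05, 0.0], [4.8, 5.1]]), classes, protos)
# preds -> array([0, 1]): each query lands on its nearest prototype
```

With few (or, in the extreme, zero) examples of a new class, adding its prototype requires no retraining, which is the property the abstract exploits.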
Distribution-Free Statistical Dispersion Control for Societal Applications
Explicit finite-sample statistical guarantees on model performance are an
important ingredient in responsible machine learning. Previous work has focused
mainly on bounding either the expected loss of a predictor or the probability
that an individual prediction will incur a loss value in a specified range.
However, for many high-stakes applications, it is crucial to understand and
control the dispersion of a loss distribution, or the extent to which different
members of a population experience unequal effects of algorithmic decisions. We
initiate the study of distribution-free control of statistical dispersion
measures with societal implications and propose a simple yet flexible framework
that allows us to handle a much richer class of statistical functionals beyond
previous work. Our methods are verified through experiments in toxic comment
detection, medical imaging, and film recommendation.
Comment: Accepted by NeurIPS as a spotlight (top 3% among submissions)
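One dispersion functional with direct societal meaning is the Gini coefficient of the loss distribution across a population. As a point of reference only, here is the standard plug-in Gini estimate on a loss sample; this is the quantity being controlled, not the paper's distribution-free bounding procedure:

```python
import numpy as np

def gini(losses):
    """Gini coefficient of a sample of non-negative losses: 0 when every
    member incurs the same loss, approaching 1 as loss concentrates on few."""
    x = np.sort(np.asarray(losses, dtype=float))
    n = x.size
    # Closed-form estimator over the sorted sample (O(n log n)).
    return (2 * np.arange(1, n + 1) - n - 1) @ x / (n * x.sum())

uniform = gini([1.0, 1.0, 1.0, 1.0])  # equal losses -> 0.0
skewed  = gini([0.0, 0.0, 0.0, 4.0])  # one member bears all loss -> 0.75
```

A guarantee on such a functional says something a bound on the mean loss cannot: whether algorithmic harms are spread evenly or concentrated on a subgroup.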
Quantile Risk Control: A Flexible Framework for Bounding the Probability of High-Loss Predictions
Rigorous guarantees about the performance of predictive algorithms are
necessary in order to ensure their responsible use. Previous work has largely
focused on bounding the expected loss of a predictor, but this is not
sufficient in many risk-sensitive applications where the distribution of errors
is important. In this work, we propose a flexible framework to produce a family
of bounds on quantiles of the loss distribution incurred by a predictor. Our
method takes advantage of the order statistics of the observed loss values
rather than relying on the sample mean alone. We show that a quantile is an
informative way of quantifying predictive performance, and that our framework
applies to a variety of quantile-based metrics, each targeting important
subsets of the data distribution. We analyze the theoretical properties of our
proposed method and demonstrate its ability to rigorously control loss
quantiles on several real-world datasets.
Comment: 24 pages, 4 figures. Code is available at
https://github.com/jakesnell/quantile-risk-contro
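The underlying device is classical: for i.i.d. losses, the probability that the k-th order statistic exceeds the p-quantile is a Binomial(n, p) tail probability, independent of the loss distribution. A stdlib sketch of the resulting one-sided quantile bound, as an illustration of the order-statistic idea rather than the paper's full family of bounds:

```python
import math

def quantile_ucb(losses, p, delta):
    """Distribution-free (1 - delta) upper confidence bound on the p-quantile
    from i.i.d. losses, using P(Q_p <= L_(k)) = P(Binomial(n, p) <= k - 1).
    Returns the smallest order statistic that certifies the bound."""
    x = sorted(losses)
    n = len(x)
    cdf = 0.0
    for k in range(1, n + 1):
        # Accumulate P(Binomial(n, p) = k - 1).
        cdf += math.comb(n, k - 1) * p ** (k - 1) * (1 - p) ** (n - k + 1)
        if cdf >= 1 - delta:
            return x[k - 1]
    return float("inf")  # sample too small to certify the bound

losses = list(range(1, 101))          # toy loss sample: 1, 2, ..., 100
bound = quantile_ucb(losses, 0.9, 0.05)  # 95%-confident bound on the 0.9-quantile
```

Relaxing delta can only lower (never raise) the certified order statistic, which is why a whole family of quantile bounds falls out of the same sorted sample.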
Im-Promptu: In-Context Composition from Image Prompts
Large language models are few-shot learners that can solve diverse tasks from
a handful of demonstrations. This implicit understanding of tasks suggests that
the attention mechanisms over word tokens may play a role in analogical
reasoning. In this work, we investigate whether analogical reasoning can enable
in-context composition over composable elements of visual stimuli. First, we
introduce a suite of three benchmarks to test the generalization properties of
a visual in-context learner. We formalize the notion of an analogy-based
in-context learner and use it to design a meta-learning framework called
Im-Promptu. Whereas the requisite token granularity for language is well
established, the appropriate compositional granularity for enabling in-context
generalization in visual stimuli is usually unspecified. To this end, we use
Im-Promptu to train multiple agents with different levels of compositionality,
including vector representations, patch representations, and object slots. Our
experiments reveal tradeoffs between extrapolation abilities and the degree of
compositionality, with non-compositional representations extending learned
composition rules to unseen domains but performing poorly on combinatorial
tasks. Patch-based representations require patches to contain entire objects
for robust extrapolation. At the same time, object-centric tokenizers coupled
with a cross-attention module generate consistent and high-fidelity solutions,
with these inductive biases being particularly crucial for compositional
generalization. Lastly, we demonstrate a use case of Im-Promptu as an intuitive
programming interface for image generation.
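The classic vector-offset completion of an analogy (A : B :: C : ?) is a simple stand-in for what an analogy-based in-context learner must do over visual tokens. A toy sketch with hypothetical two-dimensional "slot" vectors encoding [shape, color]; none of this is the paper's architecture, only the analogical-reasoning primitive it formalizes:

```python
import numpy as np

def complete_analogy(a, b, c, candidates):
    """Solve A : B :: C : ? by the vector-offset heuristic:
    apply the transformation (b - a) to c, then pick the nearest candidate."""
    target = c + (b - a)
    d = np.linalg.norm(candidates - target, axis=1)
    return int(d.argmin())

# Hypothetical slots [shape, color]: (square,red):(square,blue) :: (circle,red):?
a, b = np.array([0.0, 0.0]), np.array([0.0, 1.0])
c = np.array([1.0, 0.0])
cands = np.array([[0.0, 0.0],   # (square, red)
                  [1.0, 1.0],   # (circle, blue)
                  [1.0, 0.0]])  # (circle, red)
idx = complete_analogy(a, b, c, cands)  # -> 1, i.e. (circle, blue)
```

The abstract's finding is about which token granularity (vectors, patches, object slots) makes this kind of composition generalize, not about the offset rule itself.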
Few-Shot Attribute Learning
Semantic concepts are frequently defined by combinations of underlying
attributes. As mappings from attributes to classes are often simple,
attribute-based representations facilitate novel concept learning with zero or
few examples. A significant limitation of existing attribute-based learning
paradigms, such as zero-shot learning, is that the attributes are assumed to be
known and fixed. In this work we study the rapid learning of attributes that
were not previously labeled. Compared to standard few-shot learning of semantic
classes, in which novel classes may be defined by attributes that were relevant
at training time, learning new attributes presents a stiffer challenge. We found
that supervised learning with training attributes does not generalize well to
new test attributes, whereas self-supervised pre-training brings significant
improvement. We further experimented with random splits of the attribute space
and found that predictability of test attributes provides an informative
estimate of a model's generalization ability.
Comment: Technical report, 25 pages
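The "simple mapping from attributes to classes" that makes attribute representations useful can be made concrete: predict a binary attribute vector for an input, then pick the class whose attribute signature matches best. A hypothetical zero-shot sketch (the class names and attributes below are invented for illustration):

```python
import numpy as np

def attribute_zero_shot(pred_attrs, class_attrs):
    """Zero-shot classification via attributes: compare the model's predicted
    binary attribute vector to each class's attribute signature and return
    the Hamming-nearest class index."""
    d = (pred_attrs[None, :] != class_attrs).sum(axis=1)
    return int(d.argmin())

# Classes defined by attributes [has_stripes, has_wings, is_aquatic].
class_attrs = np.array([[1, 0, 0],   # zebra
                        [0, 1, 0],   # sparrow
                        [0, 0, 1]])  # dolphin
pred = np.array([0, 1, 0])           # attributes predicted for some input
label = attribute_zero_shot(pred, class_attrs)  # -> 1 (sparrow)
```

Note that this mapping only works if the attribute predictor generalizes, which is exactly the failure mode the abstract reports for attributes never labeled during training.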